Segmenting Hashtags and Analyzing Their Grammatical Structure
Abstract
Originally a label for marking specific tweets, hashtags are increasingly used to convey messages that people would like to see in the trending hashtags list. Complex noun phrases and even whole sentences can be turned into a hashtag. Breaking hashtags into their constituent words is a challenging task due to the irregular and compact nature of the language used on Twitter. In this study, we investigate feature-based machine learning and language model (LM) based approaches for hashtag segmentation. Our results show that an LM alone is not successful at segmenting non-trivial hashtags. However, when the N-best LM-based segmentations are incorporated as features into the feature-based approach, along with the context-based features proposed in this study, state-of-the-art hashtag segmentation results are achieved. In addition, we provide an analysis of over two million distinct hashtags, automatically segmented using our best configuration. The analysis reveals that half of all 60 million hashtag occurrences contain multiple words and that 80% of the sentiment is trapped inside multi-word hashtags, justifying the need for hashtag segmentation. Furthermore, we analyze the grammatical structure of hashtags by parsing them and observe that 77% of the hashtags are noun-based, whereas 11.9% are verb-based.
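The abstract does not say which language model is used. As a minimal sketch, assuming a unigram model built from hypothetical toy counts (`COUNTS`) and a length-based penalty for unseen strings (both assumptions, not the authors' setup), N-best LM segmentation of a hashtag can be computed with memoized dynamic programming over suffixes. The names `word_logprob` and `nbest` are illustrative only.

```python
from __future__ import annotations

import heapq
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real system would estimate these
# from a large tweet or web corpus.
COUNTS = {
    "to": 500, "get": 200, "her": 150, "together": 120,
    "we": 300, "are": 280, "strong": 60, "stronger": 40, "er": 1,
}
TOTAL = sum(COUNTS.values())


def word_logprob(word: str) -> float:
    """Unigram log-probability; unseen strings get a length-based penalty (assumed heuristic)."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Each extra character makes an unknown chunk 10x less likely,
    # so the segmenter prefers known words over long unknown spans.
    return -math.log(TOTAL) - len(word) * math.log(10)


@lru_cache(maxsize=None)
def nbest(text: str, n: int = 5) -> tuple[tuple[float, tuple[str, ...]], ...]:
    """Return the n highest-scoring segmentations of text as (log-prob, words) pairs.

    Scores add up word by word, so keeping only the top n segmentations of
    every suffix is enough to recover the exact top n for the whole string.
    """
    if not text:
        return ((0.0, ()),)
    candidates = []
    for i in range(1, len(text) + 1):
        head, rest = text[:i], text[i:]
        head_lp = word_logprob(head)
        # Extend each of the best segmentations of the remaining suffix.
        for rest_lp, rest_words in nbest(rest, n):
            candidates.append((head_lp + rest_lp, (head,) + rest_words))
    return tuple(heapq.nlargest(n, candidates))


if __name__ == "__main__":
    for tag in ("together", "wearestronger"):
        for score, words in nbest(tag, 3):
            print(f"#{tag}: {score:7.2f}  {' '.join(words)}")
```

With these invented counts, "together" outranks "to get her" purely because of the chosen frequencies; a context-free LM has no way to tell which reading a tweet intends, which hints at why the abstract pairs the N-best list with context-based features rather than relying on the LM alone.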
Similar resources
Segmenting Twitter Hashtags
Social media posts on platforms such as Twitter or Instagram use hashtags, which are author-created labels representing topics or themes, to assist in the categorization of posts and in searches for posts of interest. The structural analysis of hashtags is necessary as a precursor to understanding their meanings. This paper describes our work on segmenting nondelimited strings of hashtag-type English text...
Towards Deep Semantic Analysis of Hashtags
Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic-specific discussion or to classify a post into a desired category. Segmenting and linking the entities present within hashtags could therefore help in better understanding and extracting the information shared across social media. However, due to lac...
Segmenting Hashtags using Automatically Created Training Data
1. Hashtags are increasingly used to convey the actual message in tweets; phrases and sentences are turned into hashtags. 2. Words carrying sentiment may be trapped inside a multi-word hashtag. 3. The noisy and compact nature of the language makes hashtags very difficult to segment, and segmentation sometimes depends on context, e.g., #together: “to get her” or “together”? 4. Can we use carefully auto-segmented hashtags for training? RE...
Exploring the Meaning behind Twitter Hashtags through Clustering
Social networks are generators of large amounts of data produced by users, who are not limited with respect to the content of the information they exchange. The data generated can be a good indicator of trends and topic preferences among users. In our paper, we focus on analyzing and representing hashtags by the corpus in which they appear. We cluster a large set of hashtags using K-means on map ...
Investigating Temporal Variations in the Twitter Hashtag Graph
The increasing amount of data shared every day on social networking sites is a rich source of information about the online as well as the offline world. “Hashtags” are a semi-structured data format that is easy to track, index, and aggregate, yet they cover a very diverse range of uses since anyone can create and share them. The co-occurrence of hashtags can signify the strength of association ...